126 research outputs found

    Mean-Field Theory of Meta-Learning

    We discuss the mean-field theory for a cellular automata model of meta-learning. Meta-learning is the process of combining the outcomes of individual learning procedures in order to reach a final decision with higher accuracy than any single learning method. Our method is constructed from an ensemble of interacting learning agents that acquire and process incoming information using various types, or different versions, of machine learning algorithms. The abstract learning space, where all agents are located, is constructed here using a fully connected model that couples all agents with random strength values. The cellular automata network simulates the higher-level integration of information acquired from the independent learning trials. The final classification of incoming input data is therefore defined as the stationary state of the meta-learning system using a simple majority rule, yet minority clusters that share the opposite classification outcome can be observed in the system. Therefore, the probability of selecting the proper class for a given input can be estimated even without prior knowledge of its affiliation. Fuzzy logic can be easily introduced into the system, even if the learning agents are built from simple binary classification machine learning algorithms, by calculating the percentage of agreeing agents. Comment: 23 pages
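
    The final readout described in this abstract (simple majority rule plus the percentage of agreeing agents as a fuzzy confidence) can be illustrated with a minimal Python sketch. It covers only that readout step, not the interacting cellular automata dynamics or the random couplings; the function name `consensus` and the example votes are assumptions made for illustration.

```python
from collections import Counter


def consensus(votes):
    """Return (majority_label, fraction_of_agreeing_agents) for a list of votes.

    Illustrative readout only: it does not model agent interactions.
    On a tie, the label that appears first in `votes` is returned.
    """
    if not votes:
        raise ValueError("at least one agent vote is required")
    counts = Counter(votes)
    label, n_agree = counts.most_common(1)[0]
    return label, n_agree / len(votes)


# Hypothetical outputs of seven binary classifiers for one input.
agent_votes = [1, 1, 0, 1, 0, 1, 1]
label, confidence = consensus(agent_votes)
print(label, round(confidence, 3))
```

    The agreement fraction plays the role of a graded (fuzzy) membership value even though each agent returns only a binary label.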

    3D-Fun: predicting enzyme function from structure

    The ‘omics’ revolution is producing a flood of data, all of which needs to be annotated to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function. This form of annotation is typically performed with BLAST or similar software. Structural genomics is now also bringing us three-dimensional structures of proteins with unknown function. We present here software that can be used when sequence comparisons fail to determine the function of a protein with known structure but unknown function. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available to everybody at all these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found with proteins of known function, these are listed together with their function and some vital comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. Additionally, the superposition results are displayed using interactive graphics facilities. Currently, the 3D-Fun system only predicts enzyme function, but an expanded version with Gene Ontology predictions will be available soon. The server can be accessed at http://3dfun.bioinfo.pl/ or at http://3dfun.cmbi.ru.nl/

    AMS 3.0: prediction of post-translational modifications

    Background: We present the recent update of the AMS algorithm for identification of post-translational modification (PTM) sites in proteins based only on sequence information, using an artificial neural network (ANN) method. The query protein sequence is dissected into overlapping short sequence segments. Ten different physicochemical features describe each amino acid; therefore, a nine-residue segment is represented as a point in a 90-dimensional space. A database of sequence segments with experimentally confirmed post-translational modification sites is used for training a set of ANNs. Results: The efficiency of the classification for each type of modification and the predictive power of the method are estimated using recall (sensitivity), precision, the area under receiver operating characteristic (ROC) curves and leave-one-out cross-validation (LOOCV). Significant differences in performance are observed for differently optimized neural networks, yet the AMS 3.0 tool integrates those heterogeneous classification schemes into a single consensus scheme, and it is able to boost the precision and recall values independent of PTM type in comparison with the currently available state-of-the-art methods. Conclusions: The standalone version of AMS 3.0 presents an efficient way to identify post-translational modifications for whole proteomes. The training datasets, precompiled binaries for the AMS 3.0 tool and the source code are available at http://code.google.com/p/automotifserver under the Apache 2.0 license scheme.
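
    The segment encoding described in this abstract (overlapping nine-residue windows, ten features per residue, hence 90 dimensions) can be sketched in Python as follows. The feature table here is a made-up placeholder, not the physicochemical descriptors used by AMS 3.0, and the function names and example sequence are likewise assumptions.

```python
def segment_windows(sequence, width=9):
    """Yield (start_index, segment) for every overlapping window of `width` residues."""
    for start in range(len(sequence) - width + 1):
        yield start, sequence[start:start + width]


def encode_segment(segment, feature_table):
    """Concatenate the per-residue feature vectors into one flat vector."""
    vector = []
    for residue in segment:
        vector.extend(feature_table[residue])
    return vector


# Placeholder table: 10 arbitrary numbers per amino acid (an assumption,
# NOT the real physicochemical features).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FEATURES = {aa: [float(i + k) for k in range(10)] for i, aa in enumerate(AMINO_ACIDS)}

query = "MKTAYIAKQRQISFVK"  # arbitrary example sequence, 16 residues
vectors = [encode_segment(seg, FEATURES) for _, seg in segment_windows(query)]
print(len(vectors), len(vectors[0]))
```

    Each resulting vector is one point in the 90-dimensional space and could be passed to any classifier; the neural networks themselves are not reproduced here.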

    Multi-scale phase separation by explosive percolation with single-chromatin loop resolution.

    The 2 m-long human DNA is tightly packed into a cell nucleus 10 μm in size. The DNA packing is explained by the folding of the chromatin fiber. This folding leads to the formation of hierarchical structures such as chromosomal territories; compartments; densely packed genomic regions known as Topologically Associating Domains (TADs) or Chromatin Contact Domains (CCDs); and loops. We propose models of dynamical human genome folding into hierarchical components in human lymphoblastoid, stem cell, and fibroblast cell lines. Our models are based on explosive percolation theory. The chromosomes are modeled as graphs where CTCF chromatin loops are represented as edges. The folding trajectory is simulated by gradually introducing loops to the graph following various edge addition strategies that are based on topological network properties, chromatin loop frequencies, compartmentalization, or epigenomic features. Finally, we propose the genome folding model: a biophysical pseudo-time process guided by a single scalar order parameter. The parameter is calculated by Linear Discriminant Analysis of chromatin features. We also include the dynamics of loop formation by using the Loop Extrusion Model (LEM) as loops are added to the system. The chromatin phase separation, where the fiber folds in 3D space into topological domains and compartments, is observed when the critical number of contacts is reached. We also observe that at least 80% of the loops are needed for the chromatin fiber to condense in 3D space, and this is constant across the various cell lines. Overall, ou
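
    The graph view in this abstract (loops as edges, added one at a time while the structure is monitored) can be illustrated with a small Python sketch that tracks the largest connected component with a union-find structure. The edges here are random toy data added in arbitrary order; the paper's edge addition strategies, order parameter and loop extrusion dynamics are not reproduced, and all names are assumptions.

```python
import random


class DisjointSet:
    """Union-find that also tracks the size of the largest component."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already connected (also covers self-loops)
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_component_trajectory(n_nodes, edges):
    """Add edges in order; record the largest-component fraction after each one."""
    ds = DisjointSet(n_nodes)
    trajectory = []
    for a, b in edges:
        ds.union(a, b)
        trajectory.append(ds.largest / n_nodes)
    return trajectory


# Toy data: 200 hypothetical loop anchors and 400 random "loops".
rng = random.Random(0)
n = 200
loops = [(rng.randrange(n), rng.randrange(n)) for _ in range(400)]
traj = largest_component_trajectory(n, loops)
print(traj[0], traj[-1])
```

    Plotting `traj` against the fraction of edges added shows where a giant component emerges, which is the kind of transition a percolation analysis looks for.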

    Species Used for Drug Testing Reveal Different Inhibition Susceptibility for 17beta-Hydroxysteroid Dehydrogenase Type 1

    Steroid-related cancers can be treated by inhibitors of steroid metabolism. In searching for new inhibitors of human 17beta-hydroxysteroid dehydrogenase type 1 (17β-HSD 1) for the treatment of breast cancer or endometriosis, novel substances based on 15-substituted estrone were validated. We checked the specificity for different 17β-HSD types and species. Compounds were tested for specificity in vitro not only towards recombinant human 17β-HSD types 1, 2, 4, 5 and 7 but also against 17β-HSD 1 of several other species including marmoset, pig, mouse, and rat. The latter are used in the processes of pharmacophore screening. We present the quantification of inhibitor preferences between human and animal models. Profound differences in the susceptibility to inhibition of steroid conversion were observed among all 17β-HSDs analyzed. In particular, the rodent 17β-HSDs 1 were significantly less sensitive to inhibition than the human ortholog, while the inhibition pattern most similar to that of human 17β-HSD 1 was obtained with the marmoset enzyme. Molecular docking experiments predicted estrone as the most potent inhibitor. The best performing compound in enzymatic assays was also highly ranked by docking scoring for the human enzyme. However, species-specific prediction of inhibitor performance by molecular docking was not possible. We show that experiments in the rodent model would eliminate good candidate compounds during preclinical optimization steps. Potentially active human-relevant drugs, therefore, would no longer be further developed. Activity and efficacy screens in heterologous species systems must be evaluated with caution.

    PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines

    Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the Critical Assessment of protein Structure Prediction experiment 9 (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 for randomly selected proteins from CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at: http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo

    LipocalinPred: a SVM-based method for prediction of lipocalins

    Background: Functional annotation of rapidly amassing nucleotide and protein sequences presents a challenging task for modern bioinformatics. This is particularly true for protein families sharing extremely low sequence identity, such as lipocalins, a family of proteins with varied functions and great diversity at the sequence level, yet conserved structures. Results: In the present study we propose an SVM-based method for identification of lipocalin protein sequences. The SVM models were trained with input features generated using amino acid, dipeptide and secondary structure compositions as well as PSSM profiles. The model derived using both PSSM and secondary structure emerged as the best model in the study. Apart from achieving a high prediction accuracy (>90% in leave-one-out), LipocalinPred correctly differentiates closely related fatty acid-binding proteins and triabins as non-lipocalins. Conclusion: The method offers a promising approach as a lipocalin prediction tool, complementing PROSITE, Pfam and homology modelling methods.
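
    Two of the input feature types named in this abstract, amino acid composition and dipeptide composition, are simple to compute, and a minimal Python sketch is given here. It assumes the 20 standard residues only, and the function names and example sequence are illustrative; secondary structure composition, PSSM profiles and the SVM training itself are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


seq = "MKTAYIAKQRQISFVK"  # arbitrary example sequence
features = amino_acid_composition(seq) + dipeptide_composition(seq)
print(len(features))
```

    Non-standard residue codes are counted in the sequence length but get no feature of their own, so a real pipeline would need to decide how to handle them. Both functions expect at least two residues.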